Example structure of a “good” guide
## Warning: Removed 1 rows containing non-finite values (stat_ydensity).
Per guide:

* Per timepoint: t-test for difference in signal between RNP-only and 100 fM conditions
* Perform FDR correction for number of measurements from start of experiment to timepoint
* Return first timepoint for which corrected p-value < 0.05
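The per-guide procedure above can be sketched in Python (an illustration only; the report itself was produced in R). The sketch assumes the per-timepoint t-test p-values have already been computed, and it takes one possible reading of the correction step: at each timepoint, a Benjamini-Hochberg adjustment over all p-values from the start of the experiment up to that timepoint. The choice of Benjamini-Hochberg and the names `bh_adjust` and `first_detection_time` are assumptions, not taken from the original analysis.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def first_detection_time(timepoints, pvalues, alpha=0.05):
    """First timepoint whose p-value stays below alpha after correcting
    for all measurements made up to and including that timepoint.
    Returns None if no timepoint qualifies."""
    if len(timepoints) != len(pvalues):
        raise ValueError("timepoints and pvalues must have the same length")
    for k in range(1, len(pvalues) + 1):
        adjusted = bh_adjust(pvalues[:k])
        if adjusted[-1] < alpha:
            return timepoints[k - 1]
    return None


# Hypothetical p-values for one guide, one per timepoint (minutes).
timepoints = [0, 5, 10, 15, 20]
pvalues = [0.8, 0.5, 0.04, 0.01, 0.001]
print(first_detection_time(timepoints, pvalues))
```

Because the correction set grows with every timepoint, a raw p-value below 0.05 early in the run (0.04 at 10 minutes in the hypothetical data) does not necessarily count as a detection.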
Structures of the two guides that performed well (rate > 1 above background) but lack the hairpin structure (NCR_1346, NCR_1351):
Determining how much of the predicted hairpin structure needs to be maintained:
## NCR.id spacer
## 120 NCR_1313 GUUUACCUUGGUAAUCAUCU
## 126 NCR_1319 UCAUUAAAUGGUAGGACAGG
## 137 NCR_1330 GCAAUCAAUGGGCAAGCUUU
## 138 NCR_1331 CUUCUCUGUAGCUAGUUGUA
## 139 NCR_1332 GAGUAAAUCUUCAUAAUUAG
## 142 NCR_1335 AUGGUGUCCAGCAAUACGAA
## 143 NCR_1336 GCCGUCUUUGUUAGCACCAU
## 155 NCR_1348 AUUAGCUCUCAGGUUGUCUA
## 156 NCR_1349 UGGUACGUUAAAAGUUGAUG
## 158 NCR_1351 UGGCUACUUUGAUACAAGGU
## 21685 NCR_1410 UGAAUGUAAAACUGAGGAUCUGAAAACU
## 9671 NCR_1412 UAUAAGCAAUUGUUAUCCAGAAAGGUAC
## 10691 NCR_1417 GAUUGAGAAACCACCUGUCUCCAUUUAU
## structure
## 120 ...((((((((.........))))................))))........
## 126 ...((((((((.........))))................))))........
## 137 .......(((((....(((...((........))..))).))))).......
## 138 ..(((.(((........(((((.........)))))......))).)))...
## 139 ...............(((((((..(((......)))...)))))))......
## 142 ................((..((((((((.....))))))))..)).......
## 143 ..............((((((((((........)))..)))))))........
## 155 (((((.((((......(((.((((............))))))))))))))))
## 156 ...((((((((.........))))........)))).((((((...))))))
## 158 (((.(((((((.........))))........))))))(((((...))))).
## 21685 ((((((.((((.........))))......((.....))........)).))))......
## 9671 ...(((.((((.........))))............(((....))).........)))..
## 10691 ...............(((((((((((......................)))))).)))))
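One way to quantify how much of a predicted hairpin is maintained is to parse the dot-bracket strings in the table above into base pairs and count how many pairs of a reference structure reappear in another guide. The following Python sketch does that; the function names are hypothetical, and which structure should serve as the reference is an assumption left to the reader (the example uses the strings printed for rows 120 and 137).

```python
def pairs_from_dotbracket(structure):
    """Return the set of 0-based (i, j) base pairs in a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def fraction_pairs_maintained(reference, observed):
    """Fraction of the reference base pairs that also occur in `observed`."""
    if len(reference) != len(observed):
        raise ValueError("structures must have the same length")
    ref_pairs = pairs_from_dotbracket(reference)
    if not ref_pairs:
        return 0.0
    shared = ref_pairs & pairs_from_dotbracket(observed)
    return len(shared) / len(ref_pairs)


# Structures copied from rows 120 and 137 of the table above.
reference = "...((((((((.........))))................))))........"
observed = ".......(((((....(((...((........))..))).))))).......".strip()
print(fraction_pairs_maintained(reference, observed))
```

Comparing exact pair positions is strict: a hairpin shifted by one nucleotide counts as not maintained. A looser criterion, such as counting paired positions within a window, may suit the question better.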
## Warning in cor.test.default(GC_content, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'language'
## Warning in cor.test.default(downstream_U, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning in cor.test.default(downstream_unstructured_U, Estimate, method =
## "spearman"): Cannot compute exact p-value with ties
## Warning in cor.test.default(gRNA_MFE, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning: Removed 1 rows containing missing values (geom_col).
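The `cor.test(..., method = "spearman")` warnings above arise because tied values rule out an exact p-value, so R falls back to an asymptotic approximation. The correlation coefficient itself remains well defined when tied values receive their average rank. A minimal Python sketch of that calculation follows; it is an illustration with hypothetical function names, not the code behind the report.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


# Hypothetical data with one tie in x.
print(spearman_rho([1, 2, 2, 3], [1, 2, 3, 4]))
```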
## Likelihood ratio test
##
## Model 1: rate ~ spacer_1_A + spacer_1_C + spacer_1_G + spacer_1_U + spacer_2_A +
## spacer_2_C + spacer_2_G + spacer_2_U + spacer_3_A + spacer_3_C +
## spacer_3_G + spacer_3_U + spacer_4_A + spacer_4_C + spacer_4_G +
## spacer_4_U + spacer_5_A + spacer_5_C + spacer_5_G + spacer_5_U +
## spacer_6_A + spacer_6_C + spacer_6_G + spacer_6_U + spacer_7_A +
## spacer_7_C + spacer_7_G + spacer_7_U + spacer_8_A + spacer_8_C +
## spacer_8_G + spacer_8_U + spacer_9_A + spacer_9_C + spacer_9_G +
## spacer_9_U + spacer_10_A + spacer_10_C + spacer_10_G + spacer_10_U +
## spacer_11_A + spacer_11_C + spacer_11_G + spacer_11_U + spacer_12_A +
## spacer_12_C + spacer_12_G + spacer_12_U + spacer_13_A + spacer_13_C +
## spacer_13_G + spacer_13_U + spacer_14_A + spacer_14_C + spacer_14_G +
## spacer_14_U + spacer_15_A + spacer_15_C + spacer_15_G + spacer_15_U +
## spacer_16_A + spacer_16_C + spacer_16_G + spacer_16_U + spacer_17_A +
## spacer_17_C + spacer_17_G + spacer_17_U + spacer_18_A + spacer_18_C +
## spacer_18_G + spacer_18_U + spacer_19_A + spacer_19_C + spacer_19_G +
## spacer_19_U + spacer_20_A + spacer_20_C + spacer_20_G + spacer_20_U
## Model 2: rate ~ spacer_1_A + spacer_1_C + spacer_1_G + spacer_1_U + spacer_2_A +
## spacer_2_C + spacer_2_G + spacer_2_U + spacer_3_A + spacer_3_C +
## spacer_3_G + spacer_3_U + spacer_4_A + spacer_4_C + spacer_4_G +
## spacer_4_U + spacer_5_A + spacer_5_C + spacer_5_G + spacer_5_U +
## spacer_6_A + spacer_6_C + spacer_6_G + spacer_6_U + spacer_7_A +
## spacer_7_C + spacer_7_G + spacer_7_U + spacer_8_A + spacer_8_C +
## spacer_8_G + spacer_8_U + spacer_9_A + spacer_9_C + spacer_9_G +
## spacer_9_U + spacer_10_A + spacer_10_C + spacer_10_G + spacer_10_U +
## spacer_11_A + spacer_11_C + spacer_11_G + spacer_11_U + spacer_12_A +
## spacer_12_C + spacer_12_G + spacer_12_U + spacer_13_A + spacer_13_C +
## spacer_13_G + spacer_13_U + spacer_14_A + spacer_14_C + spacer_14_G +
## spacer_14_U + spacer_15_A + spacer_15_C + spacer_15_G + spacer_15_U +
## spacer_16_A + spacer_16_C + spacer_16_G + spacer_16_U + spacer_17_A +
## spacer_17_C + spacer_17_G + spacer_17_U + spacer_18_A + spacer_18_C +
## spacer_18_G + spacer_18_U + spacer_19_A + spacer_19_C + spacer_19_G +
## spacer_19_U + spacer_20_A + spacer_20_C + spacer_20_G + spacer_20_U +
## structure_1_. + structure_1_structured + structure_1_unstructured +
## structure_2_. + structure_2_both + structure_2_structured +
## structure_2_unstructured + structure_3_. + structure_3_both +
## structure_3_structured + structure_3_unstructured + structure_4_. +
## structure_4_both + structure_4_structured + structure_4_unstructured +
## structure_5_. + structure_5_both + structure_5_structured +
## structure_5_unstructured + structure_6_. + structure_6_both +
## structure_6_structured + structure_6_unstructured + structure_7_. +
## structure_7_both + structure_7_structured + structure_7_unstructured +
## structure_8_. + structure_8_both + structure_8_structured +
## structure_8_unstructured + structure_9_. + structure_9_both +
## structure_9_structured + structure_9_unstructured + structure_10_. +
## structure_10_both + structure_10_structured + structure_10_unstructured +
## structure_11_. + structure_11_both + structure_11_structured +
## structure_11_unstructured + structure_12_. + structure_12_both +
## structure_12_structured + structure_12_unstructured + structure_13_. +
## structure_13_both + structure_13_structured + structure_13_unstructured +
## structure_14_. + structure_14_both + structure_14_structured +
## structure_14_unstructured + structure_15_. + structure_15_both +
## structure_15_structured + structure_15_unstructured + structure_16_. +
## structure_16_both + structure_16_structured + structure_16_unstructured +
## structure_17_. + structure_17_both + structure_17_structured +
## structure_17_unstructured + structure_18_. + structure_18_both +
## structure_18_structured + structure_18_unstructured + structure_19_. +
## structure_19_both + structure_19_structured + structure_19_unstructured +
## structure_20_. + structure_20_both + structure_20_structured +
## structure_20_unstructured
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 62 -792.28
## 2 106 -754.14 44 76.281 0.001816 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
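The numbers in the likelihood ratio table above can be checked by hand: the statistic is twice the difference in log-likelihood, and the degrees of freedom are the difference in parameter counts. The Python sketch below redoes that arithmetic from the printed values. The function names are hypothetical, and the p-value uses a closed form for the chi-square upper tail that holds only for even degrees of freedom, which happens to apply here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_small, df_small, loglik_large, df_large):
    """Return (statistic, df, p-value) for two nested models."""
    statistic = 2.0 * (loglik_large - loglik_small)
    df = df_large - df_small
    return statistic, df, chi2_sf_even_df(statistic, df)


# Log-likelihoods and parameter counts as printed in the table above.
statistic, df, p_value = likelihood_ratio_test(-792.28, 62, -754.14, 106)
print(round(statistic, 2), df, p_value)
```

With the rounded log-likelihoods this gives a statistic of about 76.28 on 44 degrees of freedom, in line with the reported 76.281, and a p-value close to the reported 0.001816.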
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Model 1: sequence + structure
Model 2: only sequence
Model 3: only structure
Model 4: only sequence (binary)
Model 5: sequence (binary) + structure
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Model 6: rate ~ (antitag position 1) * (spacer structure) + (downstream unstructured U)
##
## Call:
## glm(formula = Estimate ~ ., family = "gaussian", data = subset(model6_comparison_data_onehot,
## nchar(spacer) == 20, select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -36.999 -11.676 -1.541 8.462 57.903
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.80315 2.66084 10.449 < 2e-16 ***
## antitag_pos1_A 0.06504 3.11007 0.021 0.9833
## antitag_pos1_C 0.14718 3.54166 0.042 0.9669
## antitag_pos1_G -18.74385 3.50236 -5.352 2.56e-07 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -17.98711 14.22614 -1.264 0.2077
## spacer_structure 8.61017 4.99932 1.722 0.0867 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 277.927)
##
## Null deviance: 63703 on 190 degrees of freedom
## Residual deviance: 51416 on 185 degrees of freedom
## AIC: 1624.8
##
## Number of Fisher Scoring iterations: 2
##
## Call:
## glm(formula = Estimate ~ ., family = "gaussian", data = subset(model6_comparison_data_onehot,
## select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -39.018 -11.903 -1.100 8.999 54.453
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.7728 2.6271 10.572 < 2e-16 ***
## antitag_pos1_A -0.1324 3.0318 -0.044 0.9652
## antitag_pos1_C 0.4220 3.4438 0.123 0.9026
## antitag_pos1_G -17.6095 3.4210 -5.147 6.37e-07 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -15.0276 13.6651 -1.100 0.2728
## spacer_structure 11.2954 4.8354 2.336 0.0205 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 279.9275)
##
## Null deviance: 67347 on 202 degrees of freedom
## Residual deviance: 55146 on 197 degrees of freedom
## AIC: 1727.8
##
## Number of Fisher Scoring iterations: 2
##
## Call:
## glm(formula = (Estimate > 20) ~ ., family = "binomial", data = subset(model6_comparison_data_onehot,
## nchar(spacer) == 20, select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7380 -1.3004 0.7803 0.9779 2.1603
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.59673 0.33701 1.771 0.0766 .
## antitag_pos1_A -0.10741 0.38788 -0.277 0.7818
## antitag_pos1_C -0.01259 0.44569 -0.028 0.9775
## antitag_pos1_G -2.68419 0.59822 -4.487 7.22e-06 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -1.44102 1.81316 -0.795 0.4268
## spacer_structure 0.84591 0.69198 1.222 0.2215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 264.15 on 190 degrees of freedom
## Residual deviance: 225.87 on 185 degrees of freedom
## AIC: 237.87
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = (Estimate > 20) ~ ., family = "binomial", data = subset(model6_comparison_data_onehot,
## select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8540 -1.3039 0.7348 0.9664 2.0093
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.59537 0.33264 1.790 0.0735 .
## antitag_pos1_A -0.19837 0.38016 -0.522 0.6018
## antitag_pos1_C -0.00632 0.43958 -0.014 0.9885
## antitag_pos1_G -2.39508 0.52383 -4.572 4.83e-06 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -0.76374 1.73318 -0.441 0.6595
## spacer_structure 1.16501 0.67537 1.725 0.0845 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 279.24 on 202 degrees of freedom
## Residual deviance: 242.82 on 197 degrees of freedom
## AIC: 254.82
##
## Number of Fisher Scoring iterations: 4
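Each summary above reports "1 not defined because of singularities" and an `NA` row for `antitag_pos1_U`. With all four bases one-hot encoded next to an intercept, the four indicator columns always sum to the intercept column, so one coefficient cannot be estimated and the remaining ones are read relative to that base. The Python sketch below illustrates the encoding with and without a reference level; the helper names are hypothetical and this is not the report's own code.

```python
BASES = ("A", "C", "G", "U")


def one_hot_base(base, reference=None):
    """Indicator columns for one nucleotide.

    Passing `reference` drops that base's column, which removes the
    linear dependence between the indicators and the intercept.
    """
    if base not in BASES:
        raise ValueError(f"unexpected base: {base!r}")
    return {f"antitag_pos1_{b}": int(base == b) for b in BASES if b != reference}


def design_row(base, reference=None):
    """One design-matrix row: the intercept plus the indicator columns."""
    row = {"(Intercept)": 1}
    row.update(one_hot_base(base, reference))
    return row


# Full encoding: the indicators sum to the intercept for every base.
for base in BASES:
    row = design_row(base)
    indicator_sum = sum(v for k, v in row.items() if k != "(Intercept)")
    print(base, indicator_sum == row["(Intercept)"])

# Reference-level encoding with U as the baseline.
print(design_row("G", reference="U"))
```

Under this reading, the `antitag_pos1_G` coefficient in the summaries is the difference between guides with G and guides with U at anti-tag position 1, holding the other predictors fixed.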
Model 7: reduced features
Top guides:
Bottom guides:
## Warning in eval(substitute(expr), data, enclos = parent.frame()): NAs introduced
## by coercion
## Warning: NAs introduced by coercion
## [1] "mixed model failed: NCR_1320"
## [1] "mixed model failed: NCR_1332"
## [1] "mixed model failed: NCR_1387"
## Warning: Removed 26 rows containing non-finite values (stat_smooth).
## Warning: Removed 26 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_smooth).
gBlock round 2 outlier:
Figure 1A (data): guide design pipeline
Figure 2A: range of observed guide activities
## Warning: Removed 2 rows containing missing values (geom_bar).
Figure 2B: example traces
Figure 2C: viral RNA vs. gBlock
Figure 3: elastic net regression + anti-tag result
##
## Welch Two Sample t-test
##
## data: subset(guide_rate$Estimate, guide_rate$antitag_pos1 != "G") and subset(guide_rate$Estimate, guide_rate$antitag_pos1 == "G")
## t = 7.1627, df = 66.208, p-value = 4.095e-10
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 14.64215 Inf
## sample estimates:
## mean of x mean of y
## 27.452336 8.364669
##
## Welch Two Sample t-test
##
## data: subset(guide_rate$Estimate, guide_rate$antitag_label == "G") and subset(guide_rate$Estimate, guide_rate$antitag_label %in% c("GU", "GUU"))
## t = 2.4884, df = 29.681, p-value = 0.009337
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.606595 Inf
## sample estimates:
## mean of x mean of y
## 9.916849 1.712470
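The fractional degrees of freedom in the two tests above (66.208 and 29.681) come from the Welch-Satterthwaite approximation, which does not assume equal variances in the two groups. A minimal Python sketch of the statistic and degrees of freedom follows. The function name and example data are hypothetical, and the p-value is left out because the t distribution is not available in the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error contributed by each group (sample variance / n).
    vx = variance(x) / nx
    vy = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical rates for two small groups of guides.
t_stat, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(t_stat, df)
```

The tests in the report are one-sided ("true difference in means is greater than 0"), so the corresponding p-value would be the upper-tail probability of the t distribution at the computed statistic.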
Figure 4C: LOD with Cas13-Csm6 tandem assay
Figure 4D: robustness to genetic variants
Suppl. Figure 1A: random forest variable importance
Suppl. Figure 1B: sequence logo
Suppl. Figure 2A: GC content
Suppl. Figure 2B: hybridization MFE
Suppl. Figure 2C: cleavable U in target context
Suppl. Figure 3A: spacer structure
## Warning: Groups with fewer than two data points have been dropped.
Suppl. Figure 3B: structure of direct repeat
Suppl. Figure 4A: in vivo viral structure
Suppl. Figure 4B: genomic structure vs. rate
Suppl. Figure 5A: multiplex set of 40 vs. primary screen
Suppl. Figure 5B: leave-one-out counterscreen
Suppl. Figure 5C: human RNA counterscreen
## Warning in sort(as.numeric(guide)): NAs introduced by coercion
Suppl. Figure 6: 32-pool vs. 8-pool with forced mismatch
## Warning: Ignoring unknown aesthetics: fill
gBlock rates
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_bar).
anti-tag complementarity
interaction between anti-tag G and spacer structure
## Warning: Groups with fewer than two data points have been dropped.